Analysis Summary
Pipeline Parameters
The following parameters were used for this analysis:
Full parameters available at: {organism}_pipeline_parameters.json
| Parameter | Value | Description |
|---|---|---|
| Clustering Threshold | 0.015 (98.5% identity) | Maximum genetic distance for grouping sequences into consensus genotypes. Lower values create more groups with tighter genetic similarity. |
| Similarity Threshold | 0.5 (50% identity) | Minimum sequence identity required for assigning samples to genotypes. Samples below this threshold are marked as unassigned. |
| Tie Margin | 0.001 (0.1% difference) | Maximum identity difference between top matches to flag as ambiguous. Samples with (best - runner-up) < tie margin are flagged for manual review. |
| Tie Threshold | 0.95 (95% identity) | Minimum best-match identity required to consider tie detection. Prevents flagging low-quality matches as ties. |
| Threads | 4 | Number of parallel processing threads used. |
| Phylogenetic Tree | True | Whether a phylogenetic tree was constructed. |
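The way the three assignment parameters interact can be sketched as follows. This is a minimal illustration based only on the descriptions in the table above; the function name `classify_assignment` and its status labels are hypothetical and are not taken from BOLDGenotyper's code.

```python
def classify_assignment(identities, similarity_threshold=0.5,
                        tie_margin=0.001, tie_threshold=0.95):
    """Classify one sample from its identity to each consensus genotype.

    `identities` maps genotype name -> identity in [0, 1].
    Hypothetical helper: names and labels are assumptions for illustration.
    """
    if not identities:
        return "no_sequence", None
    # Rank genotypes from highest to lowest identity.
    ranked = sorted(identities.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best = ranked[0]
    # Below the similarity threshold the sample stays unassigned.
    if best < similarity_threshold:
        return "below_threshold", None
    # Ties are only considered when the best match is itself high quality.
    if len(ranked) > 1 and best >= tie_threshold:
        runner_up = ranked[1][1]
        if best - runner_up < tie_margin:
            return "tie", best_name
    return "assigned", best_name


print(classify_assignment({"c1": 0.9990, "c3": 0.9985}))  # close, high-identity pair
print(classify_assignment({"c1": 0.60, "c3": 0.5995}))    # close, but below tie threshold
```

With the defaults above, the first call is flagged as a tie because both matches exceed 95% identity and differ by less than 0.1%, while the second is a plain assignment because the best match is below the tie threshold.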
Visualizations
High-resolution images (PNG/PDF) and data files (JSON/CSV) are available in the output directories (see individual plots for locations).
Identity Distribution
Distribution of sequence identity scores for assigned samples
Files: genotype_assignments/Sphyrna lewini_identity_distribution.png, Sphyrna lewini_identity_distribution.pdf
Phylogenetic Tree
Phylogenetic tree showing relationships between consensus groups
Files: phylogenetic/Sphyrna lewini_tree.png, Sphyrna lewini_tree.pdf
A Newick tree file is available at phylogenetic/Sphyrna lewini_tree_relabeled.nwk. It can be opened in tree editors such as TreeViewer (Bianchini & Sánchez-Baracaldo, 2024) for re-rooting and customization.
Relative Abundance by Ocean Basin
Relative abundance of genotypes across ocean basins
Files: visualization/Sphyrna lewini_distribution_bar.png, Sphyrna lewini_distribution_bar.pdf, Sphyrna lewini_distribution_bar_data.json
Total Abundance by Ocean Basin
Total sample counts of genotypes across ocean basins
Files: visualization/Sphyrna lewini_totaldistribution_bar.png, Sphyrna lewini_totaldistribution_bar.pdf, Sphyrna lewini_totaldistribution_bar_data.json
Total Abundance by Ocean Basin (Faceted)
Total sample counts faceted by species or genotype
Files: visualization/Sphyrna lewini_distribution_bar_faceted.png, Sphyrna lewini_distribution_bar_faceted.pdf, Sphyrna lewini_distribution_bar_faceted_data.json
Distribution Map
Geographic distribution of samples
Files: visualization/Sphyrna lewini_distribution_map.png, Sphyrna lewini_distribution_map.pdf, Sphyrna lewini_distribution_map_data.json
Distribution Map (Faceted)
Geographic distribution faceted by species or genotype
Files: visualization/Sphyrna lewini_distribution_map_faceted.png, Sphyrna lewini_distribution_map_faceted.pdf
Methods
Comprehensive analysis methodology and parameters suitable for reporting in peer-reviewed publications.
1. Analysis Overview
| Pipeline Version | BOLDGenotyper v1.0.0 |
| Analysis Date | 2025-11-20 |
| Input File | data/Sphyrna_lewini_scallopedhammerhead.tsv |
2. Sample Processing & Quality Control
| Total samples loaded | 685 |
| Duplicate samples removed | 10 |
| Unique samples after deduplication | 675 |
| Coordinate quality filtering | 617/675 (91.4%) retained |
| Centroid coordinates excluded | 58 |
| Valid sequences for analysis | 598 |
| Missing/invalid sequences excluded | 19 |
| Sequences too short after trimming | 3 |
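The counts in this table can be reconciled with simple arithmetic. The sketch below only reuses the figures reported above; the variable names are illustrative, and the 3 sequences that were too short after trimming are left out because the table does not state whether they are already included in the 598.

```python
# Figures copied from the table above; variable names are illustrative.
total_loaded = 685
duplicates_removed = 10
centroid_excluded = 58
missing_or_invalid = 19

unique = total_loaded - duplicates_removed   # samples after deduplication
retained = unique - centroid_excluded        # samples passing coordinate filtering
valid = retained - missing_or_invalid        # samples with a usable sequence
retained_pct = round(100 * retained / unique, 1)

print(unique, retained, valid, retained_pct)
```

Each intermediate value matches the corresponding row: 675 unique samples, 617 retained (91.4%), and 598 valid sequences.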
3. Sequence Dereplication
| Algorithm | Hierarchical clustering (average linkage) |
| Alignment | MAFFT --auto --thread 4 |
| Trimming | trimAl |
| Minimum sequence length | 400 bp |
| Clustering threshold | 0.015 (98.5% identity) |
| Pairwise comparisons | 176,715 |
| Consensus genotypes identified | 10 |
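Two of these rows can be illustrated with a short sketch. The pairwise comparison count equals the number of unordered pairs among 595 sequences, which would be consistent with the 598 valid sequences minus the 3 that were too short after trimming; that reading is an inference, not something the report states. The clustering function is a naive pure-Python stand-in for average-linkage clustering cut at a distance threshold, written for clarity rather than speed, and is not the pipeline's implementation. Merging at distances less than or equal to the threshold is an assumption.

```python
import math


def pairwise_count(n):
    """Number of unordered pairs among n sequences."""
    return math.comb(n, 2)


def average_linkage_clusters(dist, threshold):
    """Naive average-linkage clustering on a symmetric distance matrix.

    Merges the closest pair of clusters until the smallest average
    inter-cluster distance exceeds `threshold`. Returns lists of indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def avg(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > threshold:
            break
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


print(pairwise_count(598 - 3))

# Toy distance matrix: two tight pairs separated by 5% divergence.
toy = [
    [0.000, 0.005, 0.050, 0.050],
    [0.005, 0.000, 0.050, 0.050],
    [0.050, 0.050, 0.000, 0.010],
    [0.050, 0.050, 0.010, 0.000],
]
print(average_linkage_clusters(toy, threshold=0.015))
```

On the toy matrix, the 0.015 threshold groups sequences 0 and 1 together and sequences 2 and 3 together, leaving two clusters.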
4. Genotype Assignment
| Assignment method | Edit distance (edlib) |
| Similarity threshold | 0.5 (50% identity) |
| Tie detection margin | 0.001 (0.1%) |
| Tie detection threshold | 0.95 (95% identity) |
| Successfully assigned | 595/617 (96.4%) |
| Unassigned (no sequence) | 19 |
| Unassigned (below threshold) | 3 |
| Ambiguous assignments (ties) | 1 |
| Low confidence assignments | 0 |
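To make the relationship between edit distance and identity concrete, here is a minimal pure-Python sketch. The pipeline reports using edlib for this step; the Levenshtein routine below is a stand-in with the same meaning of "edit distance", and normalising by the longer sequence length is an assumption, since the report does not state the exact identity formula.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def identity(a, b):
    """Identity in [0, 1]; normalising by the longer length is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


print(identity("ACGT", "ACGA"))
```

Under this definition, a sample would need an identity of at least 0.5 to its best-matching consensus genotype to be assigned.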
5. Phylogenetic Analysis
| Alignment | MAFFT --auto --thread 4 |
| Tree inference | FastTree |
| FastTree parameters | -nt -gtr -gamma |
| Substitution model | GTR+Gamma |
| Number of taxa | 10 |
6. Geographic Analysis
| Reference dataset | GOaS v1 (Global Oceans and Seas) |
| Samples with coordinates | 171/617 (27.7%) |
| Ocean basin assignments | 71/171 (41.5%) |
| Outside known basins | 100 |
| Unknown location | 546 samples |
7. Software & Dependencies
| BOLDGenotyper | v1.0.0 |
| MAFFT | Multiple sequence alignment (Katoh & Standley, 2013) |
| trimAl | Alignment trimming (Capella-Gutiérrez et al., 2009) |
| FastTree | Phylogenetic inference (Price et al., 2010) |
| edlib | Edit distance calculation (Šošić & Šikić, 2017) |
| GOaS | Global Oceans and Seas dataset (Flanders Marine Institute, 2021) |
8. Methods Statement
DNA barcode sequences for Sphyrna lewini were downloaded from the Barcode of Life Data System (BOLD; Ratnasingham & Hebert, 2007) and processed using BOLDGenotyper v1.0.0. A total of 598 sequences were analyzed after removing duplicates and filtering for sequence quality (minimum length: 400 bp). Sequences were aligned using MAFFT (Katoh & Standley, 2013) and trimmed with trimAl (Capella-Gutiérrez et al., 2009). Consensus genotypes were identified through hierarchical clustering at 98.5% sequence identity (distance threshold: 0.015) using average linkage. Individual sequences were assigned to genotypes using edit distance calculations (minimum identity: 50%). A phylogenetic tree was constructed using FastTree (Price et al., 2010) with the GTR+Gamma substitution model. Geographic distributions were mapped using coordinates provided in BOLD and assigned to ocean basins using the Global Oceans and Seas (GOaS) v1 dataset (Flanders Marine Institute, 2021). In total, 595 sequences (96.4%) were successfully assigned to 10 consensus genotypes.
9. References
Bianchini, G., & Sánchez-Baracaldo, P. (2024). TreeViewer: Flexible, modular software to visualise and manipulate phylogenetic trees. Ecology and Evolution, 14, e10873. https://doi.org/10.1002/ece3.10873
Capella-Gutiérrez, S., Silla-Martínez, J. M., & Gabaldón, T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics, 25(15), 1972-1973.
Flanders Marine Institute (2021). Global Oceans and Seas, version 1. Available online at https://www.marineregions.org/
Katoh, K., & Standley, D. M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular Biology and Evolution, 30(4), 772-780.
Price, M. N., Dehal, P. S., & Arkin, A. P. (2010). FastTree 2 – Approximately maximum-likelihood trees for large alignments. PLoS ONE, 5(3), e9490.
Ratnasingham, S., & Hebert, P. D. (2007). BOLD: The Barcode of Life Data System. Molecular Ecology Notes, 7(3), 355-364.
Šošić, M., & Šikić, M. (2017). Edlib: a C/C++ library for fast, exact sequence alignment using edit distance. Bioinformatics, 33(9), 1394-1395.
Genotype Assignment Results
Full assignment data available at: reports/{organism}_assignment_summary.csv and genotype_assignments/{organism}_diagnostics.csv
Assignment Status Breakdown
| Status | Count | Percentage |
|---|---|---|
| Successfully Assigned | 594 | 96.3% |
| Low Confidence | 0 | 0.0% |
| Tied Assignment | 1 | 0.2% |
| Below Threshold | 3 | 0.5% |
| No Sequence Data | 19 | 3.1% |
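The percentages in this breakdown can be reproduced from the counts, which also shows how the table relates to the "595/617 (96.4%)" figure reported in the Methods section. The sketch below assumes the Methods figure counts the single tied sample as assigned; that reading fits the numbers but is not stated explicitly in the report.

```python
# Counts copied from the status table above.
counts = {
    "Successfully Assigned": 594,
    "Low Confidence": 0,
    "Tied Assignment": 1,
    "Below Threshold": 3,
    "No Sequence Data": 19,
}

total = sum(counts.values())
percentages = {status: round(100 * n / total, 1) for status, n in counts.items()}

# Assumption: the Methods figure adds the tied sample to the assigned count.
methods_assigned = counts["Successfully Assigned"] + counts["Tied Assignment"]
methods_pct = round(100 * methods_assigned / total, 1)

print(total, percentages)
print(methods_assigned, methods_pct)
```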
Identity Score Statistics (Assigned Samples)
| Metric | Value |
|---|---|
| Mean Identity | 98.66% |
| Median Identity | 99.69% |
| Minimum Identity | 76.61% |
| Maximum Identity | 100.00% |
Taxonomy
Consensus Group Taxonomy
Full table available at: taxonomy/{organism}_consensus_taxonomy.csv
| consensus_group | assigned_sp | assignment_level | assignment_notes | majority_fraction |
|---|---|---|---|---|
| consensus_c10_n1 | Sphyrna lewini | species | majority 1.00 | 1.0 |
| consensus_c1_n211 | Sphyrna lewini | species | majority 1.00 | 1.0 |
| consensus_c2_n1 | Sphyrna lewini | species | majority 1.00 | 1.0 |
| consensus_c3_n146 | Sphyrna lewini | species | majority 1.00 | 1.0 |
| consensus_c4_n1 | Sphyrna lewini | species | majority 1.00 | 1.0 |
| consensus_c5_n1 | Sphyrna lewini | species | majority 1.00 | 1.0 |
| consensus_c6_n5 | Sphyrna lewini | species | majority 1.00 | 1.0 |
| consensus_c7_n101 | Sphyrna lewini | species | majority 1.00 | 1.0 |
| consensus_c8_n127 | Sphyrna lewini | species | majority 1.00 | 1.0 |
| consensus_c9_n1 | Sphyrna lewini | species | majority 1.00 | 1.0 |
Species Composition by Consensus Group
Full table available at: taxonomy/{organism}_species_by_consensus.csv
| consensus_group | reported_species | n | frac | n_in_group |
|---|---|---|---|---|
| consensus_c10_n1 | Sphyrna lewini | 1 | 1.0 | 1.0 |
| consensus_c1_n211 | Sphyrna lewini | 211 | 1.0 | 211.0 |
| consensus_c2_n1 | Sphyrna lewini | 1 | 1.0 | 1.0 |
| consensus_c3_n146 | Sphyrna lewini | 146 | 1.0 | 146.0 |
| consensus_c4_n1 | Sphyrna lewini | 1 | 1.0 | 1.0 |
| consensus_c5_n1 | Sphyrna lewini | 1 | 1.0 | 1.0 |
| consensus_c6_n5 | Sphyrna lewini | 5 | 1.0 | 5.0 |
| consensus_c7_n101 | Sphyrna lewini | 101 | 1.0 | 101.0 |
| consensus_c8_n127 | Sphyrna lewini | 127 | 1.0 | 127.0 |
| consensus_c9_n1 | Sphyrna lewini | 1 | 1.0 | 1.0 |
| NaN | Sphyrna lewini | 22 | NaN | NaN |
Geographic Distribution
Full annotated dataset with geographic data available at: {organism}_annotated.csv
Sample Distribution by Ocean Basin
| Ocean Basin | Sample Count | Percentage |
|---|---|---|
| South China and Eastern Archipelagic Seas | 39 | 54.9 |
| Indian Ocean | 21 | 29.6 |
| South Atlantic Ocean | 4 | 5.6 |
| North Atlantic Ocean | 3 | 4.2 |
| South Pacific Ocean | 3 | 4.2 |
| North Pacific Ocean | 1 | 1.4 |
Genotypes per Ocean Basin
| consensus_group | Indian Ocean | North Atlantic Ocean | North Pacific Ocean | South Atlantic Ocean | South China and Eastern Archipelagic Seas | South Pacific Ocean |
|---|---|---|---|---|---|---|
| consensus_c10_n1 | 0 | 1 | 0 | 0 | 0 | 0 |
| consensus_c1_n211 | 17 | 0 | 0 | 3 | 2 | 0 |
| consensus_c3_n146 | 3 | 0 | 0 | 1 | 0 | 1 |
| consensus_c6_n5 | 0 | 2 | 0 | 0 | 0 | 0 |
| consensus_c7_n101 | 1 | 0 | 1 | 0 | 8 | 2 |
| consensus_c8_n127 | 0 | 0 | 0 | 0 | 29 | 0 |
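As a consistency check, the column sums of this genotype-by-basin table should match the per-basin sample counts listed under "Sample Distribution by Ocean Basin". The sketch below copies the values from the table; the variable names are illustrative, and the basin label uses the GOaS spelling "Eastern".

```python
basins = [
    "Indian Ocean",
    "North Atlantic Ocean",
    "North Pacific Ocean",
    "South Atlantic Ocean",
    "South China and Eastern Archipelagic Seas",
    "South Pacific Ocean",
]

# Rows copied from the table above, in the same column order as `basins`.
pivot = {
    "consensus_c10_n1": [0, 1, 0, 0, 0, 0],
    "consensus_c1_n211": [17, 0, 0, 3, 2, 0],
    "consensus_c3_n146": [3, 0, 0, 1, 0, 1],
    "consensus_c6_n5": [0, 2, 0, 0, 0, 0],
    "consensus_c7_n101": [1, 0, 1, 0, 8, 2],
    "consensus_c8_n127": [0, 0, 0, 0, 29, 0],
}

# Sum each column to get the number of basin-assigned samples per basin.
basin_totals = {
    basin: sum(row[i] for row in pivot.values())
    for i, basin in enumerate(basins)
}
grand_total = sum(basin_totals.values())

print(basin_totals)
print(grand_total)
```

The grand total equals the 71 samples with ocean basin assignments reported in the Geographic Analysis section. Only six of the ten consensus groups appear here because the remaining groups have no basin-assigned samples.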